“The problem of pattern and scale is the central problem in ecology. There is no single natural scale at which ecological phenomena should be studied.” - Simon Levin 1992
In this lab we will be studying how different scales (resolution and extent) might influence patterns in our data and change the answers to our research questions. As reflected in the quote from Simon Levin, scale is a central issue in how we study nature.
Here, we will focus on extent and grain. Extent is the size of the area being considered and in ecological studies can vary from a small 1-m2 plot to the entire planet. Grain refers to the size of the smallest sampling unit considered, which in spatial analysis usually refers to the size of pixels. Grain is often used interchangeably with ‘resolution’ (e.g., “we used 30-meter resolution data for our analysis”). Aside from explicit considerations of scale, I hope that this code is useful in your data management and overall analysis too.
Some scientists, especially climate scientists, may “downscale” data from very coarse resolution to finer resolution using models relating variables like temperature to elevation. In other words, the grain (resolution) of global climate models is very coarse (i.e., large pixels), but data are downscaled to a finer resolution (smaller pixels) to make the data useable for ecological or natural resource studies. See example for Alaska below.
This lab will not focus on downscaling, but if you work with climate data - especially future climate predictions - you should understand some of the methods and issues related to downscaling data. While we will not conduct any downscaling in class, we will adjust the grain (resolution) and extent of data.
Two very common grains of spatial data are 30 meters (pixels are 30 meters by 30 meters, or about the size of the infield of a baseball diamond) and 1 km. Historically, the grain of the data increased with the extent (i.e., the larger the study area, the coarser the resolution of the data). This was mostly due to computer processing limitations, which will increasingly not be an issue; even today, scientists work with and publish 30-meter data for the entire planet. You will find yourself needing to adjust the resolution of data so that all layers share the same resolution, to accommodate an analysis conducted at large extents, or because of limitations in computer processing or time.
Coarsening the resolution of raster data (upscaling): resample and aggregate.
Two common methods for coarsening your grain (or enlarging your resolution) are resampling or aggregating. Both of these methods are available in ArcGIS Pro and in the terra package in R.
If you are only interested in coarsening your data, you would probably use aggregate(). If you want your data to match the resolution of another existing dataset, you would probably use resample().
Both functions require that you decide what kind of operation you want to use to aggregate or resample from smaller pixels to larger pixels. The most important consideration in my opinion is whether the data are categorical or quantitative. To coarsen quantitative data, you will probably be using “bilinear interpolation” or “mean” or some other function that calculates a summary value surrounding the center of a new pixel. This results in a dataset with coarser resolution that represents the central tendency of pixels. To coarsen categorical data, you will most likely be using a “nearest neighbor” or “modal” operation. It does not make sense to calculate the mean of a categorical or nominal variable.
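As a minimal sketch of this distinction — using a tiny made-up raster rather than the lab data — aggregate() in terra lets you swap the summary function depending on whether the data are quantitative or categorical:

```r
library(terra)

# a tiny 4 x 4 toy raster with values 1 through 16 (not the lab data)
r <- rast(nrows = 4, ncols = 4, vals = 1:16)

# quantitative data: each new (coarser) pixel is the mean of a 2 x 2 block
r_mean <- aggregate(r, fact = 2, fun = "mean")

# categorical data: each new pixel is the most common value in the block
r_mode <- aggregate(r, fact = 2, fun = "modal")
```

With fact = 2, the 4 x 4 raster becomes 2 x 2; the choice of fun is what makes the operation appropriate for quantitative versus categorical data.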
The cartoon below shows how the finer resolution input raster can be “coarsened” to a new resolution with some operation (e.g., mean, mode, max, nearest neighbor).
Nearest neighbor method:
Calculating the average (mean) to coarsen the resolution:
Using the maximum value:
Exercise 1: Does the landscape composition of the Greater Yellowstone Ecosystem vary with different resolutions?
We will investigate how the composition of the landscape of the GYE might depend on the resolution of the data. You will learn to “upscale” categorical raster data using the aggregate function in terra with the “modal” option. “Modal” means that aggregate will take the mode (i.e., the most common value) of the finer resolution data and assign that mode to the coarser resolution. Our research question is: How does the number of land cover classes and their relative abundance change with resolution? If you are assessing the total amount of habitat for some species, you might want to know how sensitive your analysis could be to adjustments of resolution.
First, load the terra and tidyverse libraries. If these libraries are not installed on your computer, use install.packages(c("terra", "tidyverse")) first.
library(terra) # for working with spatial data
terra 1.8.60
library(tidyverse) # for data wrangling and plotting
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ tidyr::extract() masks terra::extract()
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Then, bring in the NLCD data from the GYE and plot the data, which should return a nice high resolution map of land cover for the Greater Yellowstone Ecosystem.
nlcd_gye <- rast("Z:/Teaching/BIOE515_LandEcol_Man_Fall2025/Labs/Lab 3/Data/NLCD_2024_GYE.tif")
# look at the data
plot(nlcd_gye)
Then, use res() to check the resolution of the data and ncell() to see how many total pixels are in the dataset.
### Analysis on 30-m data (base)
res(nlcd_gye) # double check the resolution
[1] 30 30
ncell(nlcd_gye) # how many grid cells (pixels) are there?
[1] 152896464
Next, we will create a simple table of land cover types, number of pixels, and the percent of the total GYE that is in each land cover. There may be a better way to do this in R, but I will use freq() to count the number of pixels in each land cover type. Then, I will use cats() to return the attribute table with the land cover names.
freq(nlcd_gye) # this provides the count (number of pixels per class)
We need to join these two tables together. Notice the relationship between “value” and “Value_1”; these are both unique identifiers for the rows, which allows us to “join” the table created by cats() with the table created by freq().
First, we have to put those tables in a dataframe object, then use left_join() to merge the two dataframes. This allows us to create an easier-to-use table that has the NLCD land cover class with the count of pixels for that class.
NOTE: I like to use “pipes” in my code, which look like this: %>%. This makes your code more readable, in my experience. In the first case below, the code can be read as “take nlcd_gye, calculate the frequency using freq(), and then turn it into a dataframe.” You could do the exact same thing like this: as.data.frame(freq(nlcd_gye)), but I think “piping” makes things clearer by avoiding the “nested” code.
# create the two dataframes
count30m <- nlcd_gye %>% freq() %>% as.data.frame()
table30m <- nlcd_gye %>% cats() %>% as.data.frame()
# join them, noting that "Value_1" in table30m is the same variable as "value" in count30m
table30m %>% left_join(count30m, by = c("Value_1" = "value"))
Let’s clean up that table by selecting only the variables of interest. Recall from Lab 2 that the relative proportion of different ecological types or land cover classes is a straightforward way to assess the composition of landscapes.
table30m %>%
  left_join(count30m, by = c("Value_1" = "value")) %>%
  mutate(pct = count / sum(count) * 100) %>%
  dplyr::select(NLCD_Land, pct)
NLCD_Land pct
1 Open Water 1.22109713
2 Perennial Ice/Snow 0.06838861
3 Developed, Open Space 0.72091677
4 Developed, Low Intensity 0.69909560
5 Developed, Medium Intensity 0.10654572
6 Developed, High Intensity 0.01048668
7 Barren Land 2.53453041
8 Deciduous Forest 0.71362800
9 Evergreen Forest 35.35371161
10 Mixed Forest 0.01024038
11 Shrub/Scrub 42.61965866
12 Grassland/Herbaceous 7.78868059
13 Pasture/Hay 1.59363379
14 Cultivated Crops 3.35021096
15 Woody Wetlands 1.36237145
16 Emergent Herbaceous Wetlands 1.84680364
Now that we’ve assessed the composition of the landscape using the 30-meter data, let’s use aggregate() to coarsen the data (i.e., increase the size of the pixels and decrease the number of pixels) to see how our assessment of landscape composition changes with spatial grain (or resolution). We’ll do this using a factor of 10 (increasing the size of the pixels to 300 meters), 100 (pixels = 3,000 meters), and 1000 (pixels = 30,000 meters). We will use the modal function, which uses the mode (most common value or type) to assign the new pixels a land cover class.
nlcd_gye10 <- aggregate(nlcd_gye, 10, fun = "modal", na.rm = TRUE)
nlcd_gye100 <- aggregate(nlcd_gye, 100, fun = "modal", na.rm = TRUE)
nlcd_gye1000 <- aggregate(nlcd_gye, 1000, fun = "modal", na.rm = TRUE)
Let’s look at the land cover maps with different resolutions.
# this tells R you want to create a multi-figure plot with 2 rows and 2 columns
par(mfrow = c(2, 2))
# Now, plot all four maps.
plot(nlcd_gye)
plot(nlcd_gye10)
plot(nlcd_gye100)
plot(nlcd_gye1000)
TASK 1 (10 points).
Now, repeat the steps above that we used for the original 30-meter dataset and report the (1) resolution, (2) number of pixels, and (3) percent of each land cover class in the GYE based on the different resolutions. See if you can create a clean table where land cover classes are the rows and the columns are the percentages for the 4 different resolutions. You can report total pixels in a table caption or in parentheses in the column header.
Your goal here is to make a table suitable for a paper: clearly organized and with an informative caption. I am a fan of “stand alone” captions for figures and tables. These are captions that describe and interpret the results. Consider the consequences of coarsening the resolution. When would the coarsest map of land cover be useful? When do you absolutely need the highest resolution data? Add some of these insights to your table caption.
Exercise 2: Does the relationship between bird diversity and net primary productivity vary with extent and grain of analysis?
A long history of ecological research has explored the mechanisms explaining why some places host more species than others. One compelling line of research suggests that biodiversity (the number of species in an area) should be positively related to ecosystem productivity (the amount of carbon captured through photosynthesis). Some ecologists have suggested that more energy captured in highly productive areas can be divided among more species, compared to ecosystems of lower productivity. We’re going to use data on bird species richness and net primary productivity (NPP) to investigate these patterns for the GYE and examine whether the resolution and extent of the spatial data change our results and interpretations.
Make sure you have the terra and tidyverse packages loaded and bring the data into R.
library(terra)
library(tidyverse)
# now just use filenames to load the data for the bird richness and npp
bird <- rast("Z:/Teaching/BIOE515_LandEcol_Man_Fall2025/Labs/Lab 3/Data/bird_richness_GYE.tif")
npp <- rast("Z:/Teaching/BIOE515_LandEcol_Man_Fall2025/Labs/Lab 3/Data/NPP2016_GYE.tif")
# let's also bring in elevation and data on ecoregions
elev <- rast("Z:/Teaching/BIOE515_LandEcol_Man_Fall2025/Labs/Lab 3/Data/elevation_GYE.tif")
ecoreg <- vect("Z:/Teaching/BIOE515_LandEcol_Man_Fall2025/Labs/Lab 3/Data/us_eco_l4_GYE.shp")
Notice that bird, npp, and elev (bird richness, NPP, and elevation) are all raster files (geotiffs, .tif), but ecoreg (ecoregions) is a vector (shapefile, .shp). Go ahead and plot ecoreg, which will show you the level 4 ecoregions from the EPA.
plot(ecoreg)
We want to be able to use these data in a raster stack (which stores many raster layers in one R object). Raster stacks are really awesome, but you have to do some adjustments of your data before R will let you stack rasters. The adjustments include making sure that the crs (coordinate reference systems) and the extents are all the same. This takes some work, but it is worth it. In our case, let’s turn those ecoregions into a raster dataset. We’ll use rasterize, which allows you to match the raster resolution and extent to an existing raster. In this case, we’ll use elevation as our standard raster and assign each pixel the variable “US_L4CODE” (which is the unique code for level 4 ecoregions). Rasterize and then plot the new raster version of ecoregions for the GYE.
ecor_gyeRast <- rasterize(ecoreg, elev, field = "US_L4CODE")
plot(ecor_gyeRast)
We will stack these rasters using c(), the well-used R function for combining or concatenating data. Remove the # and run this.
# stack <- c(elev, bird, npp, ecor_gyeRast)
This should throw an error indicating that the extents do not match. Before we can stack elevation, bird richness, NPP, and ecoregion data, we’ll need to check their coordinate reference systems (crs) and possibly project the data into a common crs. ArcGIS Pro allows you to be really sloppy with this and projects layers “on the fly” without making you reproject the data. We will first check to see if the crs and the extent (ext) of each layer matches using == (are they equal?). We’ll use elevation as our base map.
crs(elev) ==crs(bird)
[1] FALSE
crs(elev) ==crs(npp)
[1] TRUE
crs(elev) ==crs(ecor_gyeRast)
[1] TRUE
ext(elev) ==ext(bird)
[1] FALSE
ext(elev) ==ext(npp)
[1] FALSE
ext(elev) ==ext(ecor_gyeRast)
[1] TRUE
It looks like the elevation and bird richness data do not have the same crs, and NPP and elevation do not have the same extent. We will use project to match bird richness with elevation and NPP with elevation. project allows you to specify how to resample the data so that the grids match perfectly. We’ll use method = "bilinear", which stands for bilinear interpolation. This is a bit like averaging adjacent pixels to resample, but bilinear interpolation calculates a weighted average based on the locations of the 4 nearest pixel centers. This is a very common resampling method. Notice the name I’m giving the new objects includes PRJ (this tells me that this one has been projected). This step might take some time.
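The projection code itself is not shown in this handout; a minimal sketch of what it might look like — the object names birdPRJ and nppPRJ are my own, following the PRJ naming convention described above — is:

```r
# project bird richness and NPP onto the elevation grid using bilinear resampling
# (object names are illustrative, following the PRJ convention)
birdPRJ <- project(bird, elev, method = "bilinear")
nppPRJ <- project(npp, elev, method = "bilinear")

# with matching crs and extents, the four layers can now be stacked
stack <- c(elev, birdPRJ, nppPRJ, ecor_gyeRast)
```

Passing elev as the second argument to project() matches both the crs and the grid (extent and resolution) of the elevation layer, so the stack will succeed.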
Now we are ready to do some analysis. If you have enough RAM, you can turn this entire stack into a dataframe using something like stackDF <- stack %>% as.data.frame(xy = T), which will make each pixel location a row and include longitude and latitude (from xy = T) and each variable (elevation, bird richness, npp, and ecoregion code) as a column. This can be powerful, but it is often prudent to take a sample of your data and then do analyses. Here, we will use spatSample in terra to randomly select 5000 pixel locations and turn these into a dataframe.
# take 5,000 random samples ignoring NAs, turn it into a dataframe, and add the coordinates (xy = TRUE)
stackSamp <- spatSample(stack, size = 5000, method = "random", na.rm = TRUE, as.data.frame = TRUE, xy = TRUE)
# look at the top of the dataframe
head(stackSamp)
Now, let’s look at the relationship between bird richness and NPP using a simple scatterplot. Notice that I am dividing NPP by 10000. I need to make this adjustment to ensure that the units are correct.
# Make a scatterplot: NPP vs. bird richness
plot(stackSamp$npp/10000, stackSamp$bird_richness,
     xlab = "NPP (kg C/m²/yr)",
     ylab = "Bird richness")
Let’s make this a nicer looking graph in ggplot.
stackSamp %>%
  ggplot(aes(x = npp/10000, y = bird_richness)) +
  geom_point(alpha = 0.15) +
  theme_bw() +
  labs(x = "NPP (kg C/m²/yr)", y = "Bird richness")
Interpret the relationship between NPP and bird richness across the GYE. Do you think this relationship would change if we investigated these patterns within different ecoregions? Let’s do this by adding ecoregion as a “facet” using facet_wrap. This kind of data stratification (grouping your data by meaningful categories) is very easy with raster stacks in R and ggplot2. This is a simple way of grouping your data and subsampling a smaller geographic area. It is an ecologically meaningful way to evaluate whether your overall patterns change with extent.
stackSamp %>%
  ggplot(aes(x = npp/10000, y = bird_richness)) +
  geom_point(alpha = 0.15) +
  facet_wrap(~ ecoregion) +
  theme_bw() +
  labs(x = "NPP (kg C/m²/yr)", y = "Bird richness")
There’s a lot to consider here, but the patterns do look different for different ecoregions. Let’s do the same thing but using elevation as the facet (i.e., split the data into groups based on their elevation). To do this, we need to turn elevation into an ordinal variable. We will do this using ntile, which is a very easy way to ‘bin’ data into quantiles. In this case, let’s use 10 quantiles (i.e., deciles, which break the data into 10 equal-sized groups). This takes the stackSamp object and adds a variable named elev_band that is the output of ntile. mutate is a nice function in dplyr that allows you to add a variable to a dataframe. An alternative would be stackSamp$elev_band <- ntile(stackSamp$elevation, 10).
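A sketch of that binning step, using mutate() and ntile() exactly as described above:

```r
# add a new variable, elev_band, holding the elevation decile (1-10) of each sample
stackSamp <- stackSamp %>%
  mutate(elev_band = ntile(elevation, 10))
```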
Now, let’s replot NPP and bird richness by elevation bands.
stackSamp %>%
  ggplot(aes(x = npp/10000, y = bird_richness)) +
  geom_point(alpha = 0.15) +
  facet_wrap(~ elev_band) +
  theme_bw() +
  labs(x = "NPP (kg C/m²/yr)", y = "Bird richness")
Does your ecological interpretation of these patterns change with elevation?
Now, let’s look at how changing the resolution (grain) of the data might affect these patterns. Let’s drop ecoregion so that all variables are quantitative (and not nominal like ecoregion). We’ll do this using the bracket notation of R. We’ll create a new stack called stack1 that selects the 1st through 3rd layers in the original stack. We can do this using stack[[1:3]]. This method of subsetting data can be very helpful for data processing.
stack1 <- stack[[1:3]]
# look to make sure we subset the layers we meant to
names(stack1)
[1] "elevation" "bird_richness" "npp"
plot(stack1)
Histograms are a really powerful and important way to visualize the distribution of the data. Histograms essentially provide a visual display of the frequency of observations for the range of values in your data. Histograms are patterns that can tell us something about processes. Are the data heavily skewed or rather normal? What processes would give rise to a highly skewed dataset and what processes would give rise to normal data? Those questions are beyond the scope of this lab, but looking at the histograms of your data can be very revealing and is essential for doing some parametric statistical tests.
We will look at the histograms for elevation, bird richness, and NPP data and look to see how they might change with the resolution of the spatial data. We also want to look at some summary statistics like the minimum value, the maximum value, the mean, median, and variance.
# first we'll plot the histogram
hist(stack1)
Warning: [hist] a sample of 1% of the cells was used (of which 33% was NA)
Warning: [hist] a sample of 1% of the cells was used (of which 33% was NA)
Warning: [hist] a sample of 1% of the cells was used (of which 37% was NA)
# look at the summary of values
summary(stack1)
Warning: [summary] used a sample
elevation bird_richness npp
Min. :1214 Min. : 1.00 Min. : 0
1st Qu.:1949 1st Qu.: 53.42 1st Qu.: 1866
Median :2280 Median : 58.82 Median : 2855
Mean :2295 Mean : 57.07 Mean : 3421
3rd Qu.:2613 3rd Qu.: 65.12 3rd Qu.: 5021
Max. :4131 Max. :126.82 Max. :11460
NA's :33476 NA's :33630 NA's :37262
# calculate the variance
global(stack1, var, na.rm = TRUE)
Now, let’s coarsen the resolution of the spatial data (increasing the size of pixels) using the aggregate function. We’ll start with a factor of 10 (fact = 10), which will increase the 30-meter data to 300-meter, taking the mean of the smaller pixels using fun = mean and ignoring the NAs in the data using na.rm = TRUE. We’ll call this new stack stk_10 to denote the factor.
stk_10 <- aggregate(stack1, fact = 10, fun = mean, na.rm = TRUE)
Now, let’s look at the maps, histograms, and summary statistics of the coarser data and compare these to the summary stats using the 30-meter data.
# maps
plot(stk_10)
# histograms
hist(stk_10)
Warning: [hist] a sample of 65% of the cells was used (of which 33% was NA)
Warning: [hist] a sample of 65% of the cells was used (of which 33% was NA)
Warning: [hist] a sample of 65% of the cells was used (of which 34% was NA)
# summary stats
summary(stk_10)
Warning: [summary] used a sample
elevation bird_richness npp
Min. :1207 Min. : 6.22 Min. : 0
1st Qu.:1949 1st Qu.: 54.05 1st Qu.: 1979
Median :2279 Median : 59.40 Median : 3046
Mean :2294 Mean : 57.06 Mean : 3373
3rd Qu.:2613 3rd Qu.: 64.97 3rd Qu.: 4723
Max. :4069 Max. :117.72 Max. :11304
NA's :33311 NA's :33437 NA's :34100
Now, let’s look to see if the relationship between NPP and bird richness changes with this change in grain.
plot(stk_10$npp, stk_10$bird_richness)
Warning: [plot] plot used a sample of 6.6% of the cells. You can use "maxcell"
to increase the sample)
Did the relationship change?
TASK 2 (30 points).
Change the resolution of the elevation, NPP, and bird diversity data using 2 different factors (not the ones we already used) and investigate the summary statistics, histograms, and relationship between NPP and bird richness. Organize your figures and tables, interpret the results, and write a short abstract on your findings. Please turn in your abstract (no more than 250 words, following the format in Landscape Ecology), as well as any figures or tables you made.